Abstract. Several variations of the Shiryaev-Roberts detection procedure in the context of the
simple changepoint problem are considered: starting the procedure at $R_0 = 0$ (the original
Shiryaev-Roberts procedure), at $R_0 = r$ for fixed $r > 0$, and at $R_0$ that has the
quasi-stationary distribution. Comparisons of operating characteristics are made. The differences
fade as the average run length to false alarm tends to infinity. It is shown that the
Shiryaev-Roberts procedures that start either from a specially designed point $r$ or from the
random "quasi-stationary" point are third-order asymptotically optimal.
AMS subject classifications. 62L10, 62L15, 60G40

1. Introduction. The simple changepoint problem posits that one obtains a series of
observations $X_1, X_2, \ldots$ such that $\{X_i\}_{i \ge 1}$ are independent and, for some value
$\nu$, $\nu \ge 0$ (the changepoint), $X_1, X_2, \ldots, X_\nu$ have known density $f$ and
$X_{\nu+1}, X_{\nu+2}, \ldots$ have known density $g$ ($\nu = \infty$ means that all observations
have density $f$, and $\nu = 0$ means that all observations have density $g$). The changepoint
$\nu$ is unknown, and the sequence $X = \{X_i\}_{i \ge 1}$ is being monitored for detecting a
change. A sequential detection policy is defined by a stopping time $T$ (with respect to the
$X$'s), so that after observing $X_1, X_2, \ldots, X_T$ it is declared that apparently a change is
in effect.

By $P_\nu$ we denote the probability measure generated by the observations $X$ when the
changepoint is $\nu$, and $E_\nu$ stands for the corresponding expectation. The notation
$\nu = \infty$, $P_\infty$, and $E_\infty$ corresponds to the no-change scenario. In other words,
under $P_\infty$ the observations $\{X_i\}_{i \ge 1}$ are i.i.d. with density $f$, and under $P_0$
the observations $\{X_i\}_{i \ge 1}$ are i.i.d. with density $g$ (both with respect to a
dominating measure $\lambda$).

Common operating characteristics of a detection policy $T$ are $E_\infty T$, the average
run length (expected time) to false alarm (assuming there is no change), and
$\sup_{0 \le \nu < \infty} E_\nu(T - \nu \mid T > \nu)$, the maximal expected delay to detection.
Subject to a lower bound $\gamma$ on $E_\infty T$, the goal is to minimize the maximal expected
delay. Note that a uniformly optimal procedure that minimizes the expected detection delay
$E_\nu(T - \nu \mid T > \nu)$ for all $\nu \ge 0$ over stopping times with $E_\infty T \ge \gamma$
does not exist, and we have to resort to the minimax setting.

In 1961, for the problem of detecting a change in the drift of a Brownian motion, 
Shiryaev introduced a detection procedure that is a limit of the Bayes procedure when
the parameter of the exponential prior distribution tends to zero; see Shiryaev [12, 
13]. In particular, certain optimality properties of this procedure were established by 
Shiryaev [13]. In discrete time, a similar procedure was first considered by Roberts [11] 
as a particular case of the Girshick-Rubin [1] Bayesian procedure by setting the
parameter of the geometric prior distribution to zero. Therefore, this procedure is 
usually referred to as the Shiryaev-Roberts procedure. The Shiryaev-Roberts (SR) 
procedure and its modifications are the centerpiece of this paper. 

Specifically, let $\Lambda_i = g(X_i)/f(X_i)$ denote the likelihood ratio for the observation
$X_i$, and define

$$R_n = \sum_{k=1}^{n} \prod_{i=k}^{n} \Lambda_i, \qquad (1.1)$$

$$T_A = \inf\{n \ge 1 : R_n \ge A\}, \qquad (1.2)$$

where $A$ is a positive threshold that controls the false alarm rate. For a connection
between $A$ and the expected time to false alarm $E_\infty T_A$ see Pollak [7]. When defining
stopping times we always assume that $\inf\{\varnothing\} = \infty$, i.e., $T_A = \infty$ if $R_n$
never reaches level $A$. Note the recursion

$$R_{n+1} = (1 + R_n)\Lambda_{n+1}, \quad n \ge 0, \quad R_0 = 0 \qquad (1.3)$$

(with the null initial condition). 
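The equivalence between the definition (1.1) and the recursion (1.3) can be checked numerically. The following is a minimal sketch, not part of the original procedure description: the function names are hypothetical, and the "likelihood ratios" are arbitrary positive numbers rather than draws from any particular model.

```python
import math
import random


def sr_direct(lams):
    """R_n = sum_{k=1}^n prod_{i=k}^n Lambda_i, computed from the definition (1.1)."""
    n = len(lams)
    return sum(math.prod(lams[k:]) for k in range(n))


def sr_recursive(lams, r0=0.0):
    """Run R_{n+1} = (1 + R_n) * Lambda_{n+1} from R_0 = r0; return [R_1, ..., R_n]."""
    path, r = [], r0
    for lam in lams:
        r = (1.0 + r) * lam
        path.append(r)
    return path


def first_crossing(path, threshold):
    """T_A = inf{n >= 1 : R_n >= A}; math.inf if the level is never reached."""
    for n, r in enumerate(path, start=1):
        if r >= threshold:
            return n
    return math.inf


rng = random.Random(0)
# Arbitrary positive numbers standing in for likelihood ratios (illustration only).
lams = [rng.uniform(0.5, 2.0) for _ in range(12)]
path = sr_recursive(lams)
```

Passing a nonzero `r0` to `sr_recursive` gives the deterministic head start used by the SR-r procedure introduced later in this section.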

Pollak [6] tweaked the procedure by starting it off at a random $R_0$ whose distribution
is the quasi-stationary distribution $Q_A$ of the SR statistic $R_n$, defined by

$$Q_A(x) = \lim_{n \to \infty} P_\infty(R_n \le x \mid T_A > n), \qquad (1.4)$$

and showed that the stopping time

$$T_A^Q = \inf\{n \ge 1 : R_n^Q \ge A\}, \qquad (1.5)$$

where

$$R_{n+1}^Q = (1 + R_n^Q)\Lambda_{n+1}, \quad n \ge 0, \quad R_0^Q \sim Q_A, \qquad (1.6)$$

minimizes the maximal expected delay $\sup_{\nu \ge 0} E_\nu(T - \nu \mid T > \nu)$ asymptotically as
$\gamma \to \infty$ to within $o(1)$ over all stopping times that satisfy $E_\infty T \ge \gamma$,
where $A$ is such that $E_\infty T_A^Q = \gamma$. We will refer to this randomized SR procedure as
the Shiryaev-Roberts-Pollak (SRP) procedure.

Usually, $Q_A(x)$ cannot be expressed in a closed form (except in some rare cases).
To compute $Q_A(x)$ and make the SRP procedure implementable, Moustakides et al. [4]
proposed a numerical framework.

Until recently, whether the SRP procedure is exactly optimal (in the class of procedures
with $E_\infty T \ge \gamma$) was an open question. Moustakides et al. [4] present numerical
evidence that there exist procedures that are uniformly better. They regard starting off the
original SR procedure at a fixed (but specially designed) $R_0 = r$, $0 \le r < A$, and defining
the stopping time with this new deterministic initialization

$$T_A^r = \inf\{n \ge 1 : R_n^r \ge A\}, \quad A > 0, \qquad (1.7)$$

where

$$R_{n+1}^r = (1 + R_n^r)\Lambda_{n+1}, \quad n \ge 0, \quad R_0^r = r. \qquad (1.8)$$

They show by numerical examples that, for certain values of $r$, apparently
$E_\nu(T_{A_r}^r - \nu \mid T_{A_r}^r > \nu) < E_\nu(T_A^Q - \nu \mid T_A^Q > \nu)$ for all
$\nu \ge 0$, where $A_r$ and $A$ are such that $E_\infty T_{A_r}^r = E_\infty T_A^Q$ (although the
maximal expected delay is only slightly smaller for $T_{A_r}^r$). We will refer to the procedure
defined in (1.7) and (1.8) as the SR-r procedure. In [4], it is conjectured that the SR-r
procedure with a specially designed $r = r(\gamma)$ is third-order asymptotically optimal (i.e.,
to within $o(1)$) in the class of procedures with $E_\infty T \ge \gamma$ as $\gamma \to \infty$.
Examples where the SR-r procedure is strictly minimax are provided by Polunchenko and
Tartakovsky [10] and Tartakovsky and Polunchenko [14].

Shiryaev [12, 13] showed for Brownian motion that if a change takes place after
many successive applications (re-runs) of a stopping time $T$ (to a sequence $X_1, X_2, \ldots$,
starting anew after each false alarm), then the expected delay is minimized asymptotically
as $\nu \to \infty$ (i.e., in a stationary mode) over all multi-cyclic procedures with
$E_\infty T \ge \gamma$ for every $\gamma > 1$ by the original (multi-cyclic) SR procedure. Pollak
and Tartakovsky [9] showed the same for discrete time.

The goal of the present paper is to answer questions regarding comparisons between the
various SR-type procedures introduced above: the SR, SR-r, and SRP procedures. Is the
stationary expected delay of the repeated SR procedure described in the previous paragraph
similar to $\lim_{\nu \to \infty} E_\nu(T_A - \nu \mid T_A > \nu)$? (Yes; see Theorem 3.2,
Theorem 3.3, and Corollary 3.1.) What can be said about the maximal expected detection delays
of these detection procedures? (The SRP procedure and the SR-r procedure with a specially
designed $r$ are third-order asymptotically minimax, i.e., to within a negligible term
$o(1) \to 0$. See Theorem 3.4. This answer justifies the conjecture of Moustakides et al. [4].)
What can be said about $\lim_{\nu \to \infty} E_\nu(T_A - \nu \mid T_A > \nu)$,
$\lim_{\nu \to \infty} E_\nu(T_A^r - \nu \mid T_A^r > \nu)$, and
$\lim_{\nu \to \infty} E_\nu(T_A^Q - \nu \mid T_A^Q > \nu)$ when all have the same average run
length to false alarm $\gamma$? (The average delay to detection at infinity is the smallest for
the original SR procedure $T_A$, but the difference between them is $o(1)$ as
$\gamma \to \infty$. See Theorems 3.5 and 3.4.) We conclude with a numerical example that
illustrates these phenomena.

2. Preliminaries and Heuristics. Recall that $\nu$ denotes the unknown changepoint,
which is identified with the last time instant under the nominal regime. Thus,
conditional on $\nu = k$, the joint density of the vector $(X_1, \ldots, X_n)$ can be written as

$$p(X_1, \ldots, X_n \mid \nu = k) = \prod_{i=1}^{k} f(X_i) \times \prod_{i=k+1}^{n} g(X_i) \qquad (2.1)$$

for any $n \ge 1$ and $k \ge 0$, provided that $\prod_{i=k+1}^{n} g(X_i) = 1$ whenever $k \ge n$.

Given observations $X_1, \ldots, X_n$, introduce the hypotheses $H_k : \nu = k < n$ that
the change occurs somewhere within this stretch of observations and $H_\infty : \nu = \infty$
that there is no change. Clearly, the latter hypothesis is equivalent to the hypothesis
$H_\nu$, $\nu \ge n$. According to (2.1), the likelihood ratio of these hypotheses is

$$\frac{p(X_1, \ldots, X_n \mid H_k)}{p(X_1, \ldots, X_n \mid H_\infty)} = \prod_{i=k+1}^{n} \Lambda_i,$$

where $\Lambda_i = g(X_i)/f(X_i)$. Therefore, the SR statistic (1.1) can be interpreted as the
average likelihood ratio averaged over a uniform improper prior distribution of the
changepoint.

By $T$ we denote a generic stopping time (or a detection procedure) and by
$\mathbf{C}_\gamma = \{T : E_\infty T \ge \gamma\}$ the class of detection procedures (stopping
times) for which the average run length (ARL) to false alarm does not fall below a given number
$\gamma > 1$.

The following two objects will be of main interest in this paper: the Supremum
Average Delay to Detection

$$\mathrm{SADD}(T) = \sup_{0 \le \nu < \infty} E_\nu(T - \nu \mid T > \nu)$$

and the limiting value of the average detection delay, which we will refer to as the Average
Delay to Detection at Infinity,

$$\mathrm{ADD}_\infty(T) = \lim_{\nu \to \infty} E_\nu(T - \nu \mid T > \nu).$$

As we mentioned in the introduction, we are interested in a minimax setting
of minimizing the maximal expected delay $\mathrm{SADD}(T)$ over stopping times with the
lower bound on the ARL to false alarm $E_\infty T \ge \gamma$, i.e., in finding a procedure that
would minimize $\mathrm{SADD}(T)$ in the class $\mathbf{C}_\gamma$:
$\inf_{T \in \mathbf{C}_\gamma} \mathrm{SADD}(T) \to T_{\mathrm{opt}}$. However, in
general we are unable to find an exact solution to this problem for every $\gamma > 1$
and, for this reason, we focus on asymptotic solutions for a large ARL to false alarm
$\gamma$; see Polunchenko and Tartakovsky [10] and Tartakovsky and Polunchenko [14] for
examples where an exact minimax solution is available.

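Definition 2.1. We call the procedure $T_0 \in \mathbf{C}_\gamma$ first-order asymptotically
optimal if

$$\lim_{\gamma \to \infty} \frac{\mathrm{SADD}(T_0)}{\inf_{T \in \mathbf{C}_\gamma} \mathrm{SADD}(T)} = 1,$$

i.e., $\inf_{T \in \mathbf{C}_\gamma} \mathrm{SADD}(T) = \mathrm{SADD}(T_0)(1 + o(1))$, where
$o(1) \to 0$ as $\gamma \to \infty$.

We call the procedure $T_0 \in \mathbf{C}_\gamma$ second-order asymptotically optimal if

$$\inf_{T \in \mathbf{C}_\gamma} \mathrm{SADD}(T) = \mathrm{SADD}(T_0) + O(1) \quad \text{as } \gamma \to \infty,$$

where $O(1)$ is bounded as $\gamma \to \infty$.

We call the procedure $T_0 \in \mathbf{C}_\gamma$ third-order asymptotically optimal if

$$\inf_{T \in \mathbf{C}_\gamma} \mathrm{SADD}(T) = \mathrm{SADD}(T_0) + o(1) \quad \text{as } \gamma \to \infty,$$

where $o(1)$ tends to zero as $\gamma \to \infty$.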

It follows from Pollak [6] that the SRP procedure (1.5) is third-order asymptotically
optimal whenever $E_0|\log \Lambda_1| < \infty$. In Section 3.2 we prove the third-order
asymptotic optimality property under the stronger second moment condition
$E_0|\log \Lambda_1|^2 < \infty$ using different techniques. The second moment condition allows
us to obtain higher-order asymptotic approximations for $\mathrm{SADD}(T_A^Q)$ and
$\inf_{T \in \mathbf{C}_\gamma} \mathrm{SADD}(T)$ (up to a vanishing term). Since the SRP procedure
is an equalizer, i.e., $E_\nu(T_A^Q - \nu \mid T_A^Q > \nu)$ does not depend on $\nu$, it is
sufficient to evaluate the average run length to detection $E_0 T_A^Q$ assuming that the change
is in effect from the very beginning.
More importantly, using the ideas of Moustakides et al. [4], we are able to design
the initialization point $r$ in the SR-r procedure (1.7), which may or may not depend
on the false alarm constraint $\gamma$, so that this procedure is also third-order
asymptotically optimal. In this respect, the average delay to detection at infinity
$\mathrm{ADD}_\infty(T_A^r)$ plays a critical role. To understand why, let us look at Figure 2.1,
which shows the average detection delay $E_\nu(T_A^r - \nu \mid T_A^r > \nu)$ versus $\nu$ for
several initialization values $r$. This figure was obtained using integral equations and
numerical techniques of Moustakides et al. [4]. For $r = 0$, this is the classical SR procedure,
whose average detection delay is monotonically decreasing to its minimum, which is attained at
infinity (a steady-state value). It is seen that there exists a value $r = r^*$, which
generally may depend on the threshold $A$, for which the worst point $\nu$ is at infinity, i.e.,
$\mathrm{SADD}(T_A^{r^*}) = \mathrm{ADD}_\infty(T_A^{r^*})$. This is a very important
observation, since it allows us to build a proof of asymptotic optimality based on an estimate
of $\mathrm{ADD}_\infty(T_A^{r^*})$. Particular choices of the "head start" $r^*$ will be
discussed in the following sections.
Fig. 2.1. Typical behavior of the expected detection delay as a function of changepoint $\nu$ for
various initialization strategies.



The monotonicity of the curve for the average detection delay of the SR procedure
also allows us to conclude (intuitively only, since this is a numerical observation
and there is no theoretical justification of monotonicity) that the asymptotic
lower bound for $\inf_{T \in \mathbf{C}_\gamma} \mathrm{SADD}(T)$ can be evaluated based on the
value of $\mathrm{ADD}_\infty(T_A)$. Asymptotically, $E_0 T_A^Q$,
$\mathrm{ADD}_\infty(T_A^{r^*})$, and $\mathrm{ADD}_\infty(T_A)$ are the same, since the mean of
the quasi-stationary distribution is of order $O(\log A)$ and the values of the head start
that lead to the almost optimal performance are either fixed (i.e.,
$\lim_{A \to \infty} r_A^* = r^*$) or go to infinity in such a way that $r_A^*/A \to 0$
as $A \to \infty$, as we will see from the following study.



3. Asymptotic Performance of the SR-r and SRP Procedures. In this
section, we discuss the asymptotic behavior of the SR-r and SRP detection procedures
for large values of the threshold $A$ and the ARL to false alarm $\gamma$.

3.1. Average Run Length to False Alarm. Let $Z_i = \log \Lambda_i$ denote the
log-likelihood ratio for the $i$-th observation and let $S_n = Z_1 + \cdots + Z_n$. Introduce a
one-sided stopping time

$$\tau_a = \inf\{n \ge 1 : S_n \ge a\}, \quad a > 0.$$

Let $\kappa_a = S_{\tau_a} - a$ be an overshoot (excess over the level $a$ at stopping), and let

$$\zeta = \lim_{a \to \infty} E_0[e^{-\kappa_a}], \qquad \varkappa = \lim_{a \to \infty} E_0 \kappa_a. \qquad (3.1)$$

The constants $\zeta$ and $\varkappa$ depend on the model and can be computed numerically. In
general, $0 < \zeta < 1$ and $\varkappa > 0$.

Theorem 3.1. Assume that $r \le r^*$, where $r^*$ is either fixed or, more generally,
$r^* = r_A \to \infty$ in such a way that $r_A/A \to 0$ as $A \to \infty$. Then for the SR-r
procedure, uniformly in $0 \le r \le r_A$,

$$E_\infty T_A^r = (A/\zeta)(1 + o(1)) \quad \text{as } A \to \infty, \qquad (3.2)$$

where the constant $\zeta$ is defined in (3.1).
For the SRP procedure

$$E_\infty T_A^Q = (A/\zeta)(1 + o(1)) \quad \text{as } A \to \infty. \qquad (3.3)$$



The proof of this theorem is given in the Appendix. 

Let $R_\infty$ denote a random variable that asymptotically as $n \to \infty$ has the same
$P_\infty$-distribution as $R_n^r$, i.e.,

$$P(R_\infty \le x) = \lim_{n \to \infty} P_\infty(R_n^r \le x) := Q_{\mathrm{st}}(x),$$

where $Q_{\mathrm{st}}(x)$ is called the stationary distribution of $R_n^r$. Recall also that we
denote by $Q_A(x) = \lim_{n \to \infty} P_\infty(R_n^r \le x \mid T_A^r > n)$ the quasi-stationary
distribution of $R_n^r$.

We always assume that both the quasi-stationary distribution $Q_A(x)$ and the stationary
distribution $Q_{\mathrm{st}}(x)$ exist, which is always true when $\Lambda_1$ is continuous.

Note that the process $\{R_n^r - n - r\}_{n \ge 0}$ is a zero-mean $P_\infty$-martingale and,
hence, applying the optional sampling theorem yields
$E_\infty T_A^r = E_\infty R^r_{T_A^r} - r$, which can be used to approximate $E_\infty T_A^r$.
Using the above fact along with Theorem 3.1, for practical purposes we suggest the following
approximations

$$E_\infty T_A^r \approx A/\zeta - r \qquad (3.4)$$

and

$$E_\infty T_A^Q \approx A/\zeta - \mu_A, \qquad (3.5)$$

where $\mu_A = \int_0^A x \, dQ_A(x)$ is the mean of the quasi-stationary distribution.
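The martingale property rests on $E_\infty \Lambda_1 = 1$, which gives $E_\infty R_n^r = n + r$ for every fixed $n$. A minimal sketch of an exact check follows; the two-point Bernoulli model and the function name `expected_sr` are assumptions chosen purely for illustration (they do not come from the paper), so that the expectation can be computed by enumerating all sample paths with rational arithmetic.

```python
from fractions import Fraction
from itertools import product

# Hypothetical two-point model (an assumption for illustration only):
# pre-change f = Bernoulli(1/2), post-change g = Bernoulli(7/10).
P_F = {1: Fraction(1, 2), 0: Fraction(1, 2)}
P_G = {1: Fraction(7, 10), 0: Fraction(3, 10)}
LAMBDA = {x: P_G[x] / P_F[x] for x in (0, 1)}  # likelihood ratio g(x)/f(x)


def expected_sr(n, r):
    """Exact E_infinity[R_n^r], enumerating all 2^n pre-change sample paths."""
    total = Fraction(0)
    for xs in product((0, 1), repeat=n):
        prob, stat = Fraction(1), Fraction(r)
        for x in xs:
            prob *= P_F[x]                   # probability of the path under f
            stat = (1 + stat) * LAMBDA[x]    # recursion (1.8)
        total += prob * stat
    return total
```

In this toy model `expected_sr(n, r) - n - r` is exactly zero for every `n` and `r`, which is the fixed-time version of the identity used above before optional sampling is applied.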

Note that the mean of the quasi-stationary distribution is of order $O(\log A)$ as
$A \to \infty$. Indeed, by Kesten [2, Theorem 5],

$$\lim_{n \to \infty} P_\infty(R_n > x) = 1 - Q_{\mathrm{st}}(x) \sim 1/x \quad \text{as } x \to \infty, \qquad (3.6)$$

which along with the fact that $Q_A(x) \ge Q_{\mathrm{st}}(x)$ (cf. Pollak and Siegmund [8]) yields

$$\mu_A = \int_0^A [1 - Q_A(x)] \, dx \le \int_0^A [1 - Q_{\mathrm{st}}(x)] \, dx = O(\log A). \qquad (3.7)$$

Since $Q_A(x) \to Q_{\mathrm{st}}(x)$ as $A \to \infty$, it follows that
$\mu_A = \log A + C_A$, where $C_A = O(1)$ as $A \to \infty$, so that

$$\mu_A = \log A + O(1) \quad \text{as } A \to \infty. \qquad (3.8)$$



3.2. Average Delay to Detection and Asymptotic Optimality. We continue
with obtaining asymptotic approximations (as $A \to \infty$) for the average delay to
detection $E_\nu(T_A^r - \nu \mid T_A^r > \nu)$, including the case of a large changepoint $\nu$,
i.e., for $\mathrm{ADD}_\infty(T_A^r)$, as well as with deriving an asymptotic lower bound for
$\inf_{T \in \mathbf{C}_\gamma} \mathrm{SADD}(T)$. This will allow us to ascertain whether the
SR-r procedure with a certain initialization $r$ (which is either fixed or may depend on
$\gamma$) is third-order asymptotically optimal as $\gamma \to \infty$.

Recall that $Z_i = \log \Lambda_i$ is the log-likelihood ratio for the observation $X_i$,
$S_n = \sum_{i=1}^{n} Z_i$, and $\varkappa$ is the limiting average overshoot in the one-sided
test $\tau_a = \min\{n \ge 1 : S_n \ge a\}$ defined in (3.1). Let
$S_n^\nu = \sum_{i=\nu+1}^{n} Z_i$ and let $V_{\nu,\infty} = \sum_{n=\nu+1}^{\infty} e^{-S_n^\nu}$.
Let

$$I = E_0 Z_1 = \int \log\left(\frac{g(x)}{f(x)}\right) g(x) \, \lambda(dx)$$

denote the Kullback-Leibler information number.

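Lemma 3.1. Let $E_0|Z_1|^2 < \infty$ and assume that $Z_1$ is non-arithmetic. Let
$0 < N_A < A$ be such that $N_A/(A^{1-\delta} \log A) \to \infty$ and $N_A = O(A/\log A)$ as
$A \to \infty$ for some $\delta \in (0, 1)$. Let $r \ge 0$ and let $T_A^r$ be defined as in (1.7).
Then, as $A \to \infty$,

$$E_\nu(T_A^r - \nu \mid T_A^r > \nu, R_\nu^r) = \frac{1}{I}\left\{\log A + \varkappa - \log(1 + R_\nu^r) - E_\nu\left[\log\left(1 + \frac{V_{\nu,\infty}}{1 + R_\nu^r}\right) \,\Big|\, T_A^r > \nu, R_\nu^r\right]\right\} + o(1), \qquad (3.9)$$

where $o(1) \to 0$ as $A \to \infty$ uniformly on
$\{N_A \le \nu < \infty, \; R_\nu^r < A/N_A, \; 0 \le r < \infty\}$.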

The proof of Lemma 3.1 is given in the Appendix. 

Remark 3.1. Let $V_\infty = \sum_{i=1}^{\infty} e^{-S_i}$. Note that $V_{\nu,\infty}$ is
independent of $R_\nu^r$ and has the same $P_\nu$-distribution for all $\nu \ge 1$, i.e., it is
distributed as $V_\infty$ under $P_0$.

Recall that by $R_\infty$ we denote a random variable that has the $P_\infty$-limiting
(stationary) distribution of $R_n$ as $n \to \infty$, i.e.,
$Q_{\mathrm{st}}(x) = \lim_{n \to \infty} P_\infty(R_n \le x) = P(R_\infty \le x)$.
Let

$$C_\infty = E[\log(1 + R_\infty + V_\infty)] = \int_0^\infty \int_0^\infty \log(1 + x + y) \, dQ_{\mathrm{st}}(x) \, dQ(y), \qquad (3.10)$$

and

$$C_r = E[\log(1 + r + V_\infty)] = \int_0^\infty \log(1 + r + y) \, dQ(y), \qquad (3.11)$$

where $Q(y) = P_0(V_\infty \le y)$.
The following theorem, whose proof is based on Lemma 3.1 and can be found in the
Appendix, provides asymptotic approximations (for large $A$) for the average delay to
detection of the SR-r procedure (for large $\nu$ and $\nu = 0$), and for the supremum average
delay to detection $\mathrm{SADD}(T_A^Q) = E_0 T_A^Q$ of the SRP procedure (within vanishing
terms $o(1)$).

Theorem 3.2. If $E_0|Z_1|^2 < \infty$ and $Z_1$ is non-arithmetic, then for any $r \ge 0$

$$\mathrm{ADD}_\infty(T_A^r) = E_0 T_A^Q = \frac{1}{I}[\log A + \varkappa - C_\infty] + o(1) \quad \text{as } A \to \infty, \qquad (3.12)$$

$$E_0 T_A^r = \frac{1}{I}[\log A + \varkappa - C_r] + o(1) \quad \text{as } A \to \infty, \qquad (3.13)$$

where $o(1) \to 0$ as $A \to \infty$.
Define

$$J(T) = \frac{\sum_{\nu=0}^{\infty} E_\nu(T - \nu \mid T > \nu) \, P_\infty(T > \nu)}{E_\infty T}. \qquad (3.14)$$

The following lemma provides the lower bound for the supremum average delay
to detection in the class $\mathbf{C}_\gamma$. This bound will be used to obtain an asymptotic
lower bound in Theorem 3.3 and for the proof of third-order asymptotic optimality
of detection procedures in Theorem 3.4.

Lemma 3.2. Let $T_{A_\gamma}$ be the stopping time of the SR procedure that starts from
zero and let the threshold $A = A_\gamma$ be chosen so that $E_\infty T_{A_\gamma} = \gamma$. The
following lower bound holds:

$$\inf_{T \in \mathbf{C}_\gamma} \mathrm{SADD}(T) \ge J(T_{A_\gamma}). \qquad (3.15)$$

Proof. Obviously, for any stopping time $T$,

$$\mathrm{SADD}(T) = \frac{\sum_{k=0}^{\infty} [\sup_\nu E_\nu(T - \nu \mid T > \nu)] \, P_\infty(T > k)}{E_\infty T} \ge \frac{\sum_{k=0}^{\infty} E_k(T - k \mid T > k) \, P_\infty(T > k)}{E_\infty T} = J(T).$$

As follows from Pollak and Tartakovsky [9], the right-hand side is minimized by the
SR stopping time $T_{A_\gamma}$, so that

$$\inf_{T \in \mathbf{C}_\gamma} \mathrm{SADD}(T) \ge \inf_{T \in \mathbf{C}_\gamma} J(T) = J(T_{A_\gamma}),$$

and the proof is complete. □

The following theorem provides the asymptotic approximation for the lower bound
$J(T_A)$. Its proof is given in the Appendix.

Theorem 3.3. Let $J(T)$ be defined as in (3.14) and $C_\infty$ as in (3.10). If
$E_0|Z_1|^2 < \infty$ and $Z_1$ is non-arithmetic, then

$$J(T_A) = \frac{1}{I}(\log A + \varkappa - C_\infty) + o(1) \quad \text{as } A \to \infty,$$

where $o(1) \to 0$ as $A \to \infty$.
Theorem 3.3 also allows for the following interpretation. Consider the following
multi-cyclic detection procedure. Let a stopping time $T$ be applied repeatedly after
each alarm, so that $T_1, T_2, \ldots$ are independent copies of $T$ and $T_j$ is the time
interval between the $(j-1)$th and $j$th alarms. Clearly, the number $\ell_\nu$ of false alarms
before the changepoint $\nu$ is

$$\ell_\nu = \max\{i : T_1 + \cdots + T_i \le \nu\}, \qquad (3.16)$$

and the real change occurring at the point $\nu + 1$ is detected at the time
$\mathcal{T}_{\ell_\nu} = T_1 + \cdots + T_{\ell_\nu + 1}$. For any fixed $\nu \ge 0$, the average
delay to detection of the multi-cyclic (repeated) detection procedure is
$E_\nu(\mathcal{T}_{\ell_\nu} - \nu)$. Assuming that the change occurs at a
far time horizon (i.e., $\nu \to \infty$), introduce the stationary average detection delay

$$\mathrm{STADD}(T) = \lim_{\nu \to \infty} E_\nu(\mathcal{T}_{\ell_\nu} - \nu).$$

By applying renewal theory, it can be shown that $\mathrm{STADD}(T) = J(T)$; see
Pollak and Tartakovsky [9]. Furthermore, the SR procedure is exactly optimal in
the sense of minimizing the STADD.

Therefore, the following corollary is a direct consequence of Theorem 3.2 and 
Theorem 3.3. 

Corollary 3.1. If $Z_1$ is non-arithmetic and $E_0|Z_1|^2 < \infty$, then, as $A \to \infty$,

$$\mathrm{ADD}_\infty(T_A^r) = \mathrm{STADD}(T_A) + o(1)$$

and

$$\mathrm{STADD}(T_A) = \frac{1}{I}(\log A + \varkappa - C_\infty) + o(1).$$

The following theorem establishes asymptotic optimality of the SRP and SR-r 
detection procedures under moderate conditions. Its proof is immediate from the 
above results. 

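Theorem 3.4. Let $E_0|Z_1|^2 < \infty$ and let $Z_1$ be non-arithmetic.

(i) Then

$$\inf_{T \in \mathbf{C}_\gamma} \mathrm{SADD}(T) \ge \frac{1}{I}[\log(\gamma\zeta) + \varkappa - C_\infty] + o(1) \quad \text{as } \gamma \to \infty. \qquad (3.17)$$

(ii) If in the SRP procedure $A = A_\gamma = \gamma\zeta$, then
$E_\infty T_A^Q = \gamma(1 + o(1))$ and

$$\mathrm{SADD}(T_A^Q) = \frac{1}{I}[\log(\gamma\zeta) + \varkappa - C_\infty] + o(1) \quad \text{as } \gamma \to \infty. \qquad (3.18)$$

Therefore, the SRP procedure is asymptotically third-order optimal in the class
$\mathbf{C}_\gamma$:

$$\inf_{T \in \mathbf{C}_\gamma} \mathrm{SADD}(T) = \mathrm{SADD}(T_A^Q) + o(1) \quad \text{as } \gamma \to \infty.$$

(iii) If in the SR-r procedure $A = A_\gamma = \gamma\zeta$, and the initialization point $r$ is
either fixed or tends to infinity with the rate $o(\gamma)$ and is selected so that
$\mathrm{SADD}(T_A^r) = \mathrm{ADD}_\infty(T_A^r)$, then $E_\infty T_A^r = \gamma(1 + o(1))$ and

$$\mathrm{SADD}(T_A^r) = \frac{1}{I}[\log(\gamma\zeta) + \varkappa - C_\infty] + o(1) \quad \text{as } \gamma \to \infty. \qquad (3.19)$$

Therefore, the SR-r procedure is asymptotically third-order optimal:

$$\inf_{T \in \mathbf{C}_\gamma} \mathrm{SADD}(T) = \mathrm{SADD}(T_A^r) + o(1) \quad \text{as } \gamma \to \infty.$$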

Proof. The asymptotic lower bound (3.17) follows from Lemma 3.2 and Theorem 3.3.
The asymptotic approximations (3.18) and (3.19) (and the statements of (ii)
and (iii) as a whole) follow from Theorems 3.1 and 3.2. □

Remark 3.2. Third-order asymptotic optimality of the SRP procedure follows
from Pollak [6] (under the sole first moment condition), so that this result is not new.
However, the higher-order asymptotic approximation for the SADD (3.18) is new.

Remark 3.3. Feasibility of selecting $r$ so that
$\mathrm{SADD}(T_A^r) = \mathrm{ADD}_\infty(T_A^r)$ follows from numerical experiments performed
by Moustakides et al. [4] as well as from the example in Section 4. See Figure 2.1 in Section 2
and Figures 4.2 and 4.3 in Section 4.

Remark 3.4. Since for the SR procedure $\mathrm{SADD}(T_A) = E_0 T_A$, it follows from
Theorem 3.2 (setting $r = 0$ in (3.13)) that

$$\mathrm{SADD}(T_A) = \frac{1}{I}(\log A + \varkappa - C_0) + o(1) \quad \text{as } A \to \infty,$$

where $C_0 = E[\log(1 + V_\infty)]$. Since $A = \zeta\gamma$ implies
$E_\infty T_A = \gamma(1 + o(1))$, it follows that with this choice of threshold

$$\mathrm{SADD}(T_A) = \frac{1}{I}[\log(\zeta\gamma) + \varkappa - C_0] + o(1) \quad \text{as } \gamma \to \infty. \qquad (3.20)$$

Comparing (3.20) with the lower bound (3.17) shows that

$$\inf_{T \in \mathbf{C}_\gamma} \mathrm{SADD}(T) = \mathrm{SADD}(T_A) + O(1) \quad \text{as } \gamma \to \infty.$$

Thus, the SR procedure is only second-order asymptotically optimal, and the difference
is approximately equal to $(C_\infty - C_0)/I$. This difference can be quite large when
detecting small changes (i.e., when $I$ is small).

It is worth noting that Theorem 3.2 suggests that if the initializing point $r = r^*$
is selected from the equation $C_{r^*} = C_\infty$, then for a large ARL to false alarm $\gamma$
the values of the average delays to detection at zero and infinity are approximately
equal, $E_0[T_A^{r^*}] \approx \mathrm{ADD}_\infty(T_A^{r^*})$ (to within small terms $o(1)$).
This choice of the head start is intuitively appealing since we intend to make the SR-r
procedure look like an equalizer as much as possible. Obviously, the value of $r^*$ does not
depend on $\gamma$, i.e., it is a fixed number that depends on the model. This observation will
be further elaborated in Section 4. Note also that the fact that the limiting value
$\lim_{A \to \infty} r_A^* = r^*$ equating $E_0[T_A^{r^*}]$ and $\mathrm{ADD}_\infty(T_A^{r^*})$ to
within $o(1)$ is a fixed number was first noticed by Moustakides and Tartakovsky [5] for the
problem of detecting a change in the drift of a Brownian motion. Also, although starting at
$r^*$ gives a faster initial response than starting at $r = 0$, the resemblance to Lucas and
Crosier's [3] FIR scheme is secondary: their method is designed to give a really fast initial
response, whereas our goal is to attain asymptotic third-order optimality.

It is interesting to ask how the average detection delays at infinity
$\mathrm{ADD}_\infty(T_A)$, $\mathrm{ADD}_\infty(T_A^r)$, and $\mathrm{ADD}_\infty(T_A^Q)$ are
related when all three procedures have the same average run length to false alarm $\gamma$. It
turns out that the $\mathrm{ADD}_\infty$ is the smallest for the original SR procedure $T_A$.
Theorem 3.5 below proves this statement. Note also that by Theorems 3.2 and 3.4 the difference
between the ADDs of all three procedures is $o(1)$ as $\gamma \to \infty$.

This result can be proven in two steps: (1) show that the ARL to false alarm
$E_\infty T_A^Q$ of the SRP procedure is increasing in the threshold $A$ (the fact that the ARL
to false alarm of the SR-r procedure is increasing in $A$ for a fixed $r$ is obvious); and
(2) show that the average delay $\mathrm{ADD}_\infty(T_A^Q)$ of the SRP procedure is increasing
in $A$ (obviously, the ADDs at infinity are the same for all three procedures when the
same threshold is being used). Since the SR procedure requires the lowest threshold
to attain the same false alarm rate, this implies that the SR procedure has the lowest
$\mathrm{ADD}_\infty$. We believe that $E_\infty T_A^Q$ and $\mathrm{ADD}_\infty(T_A^Q)$ are both
increasing in $A$ in the general case. However, we are able to prove this fact only when the
cumulative distribution function of $\log \Lambda_1$ is log-concave, both pre-change and
post-change, something that guarantees monotonicity properties of the Markov detection
statistics. This is restrictive, but it does hold, for example, in detection of a shift of a
normal mean and in detection of a change of the parameter of an exponential distribution. It
also holds for the example considered in the next section.

For $\eta > 0$, regard the sequence defined by the recursion

$$R_{n+1}^{(\eta)} = (\eta + R_n^{(\eta)})\Lambda_{n+1}, \quad R_0^{(\eta)} = r. \qquad (3.21)$$



To prove the required result we need the following lemma whose proof can be 
found in the Appendix. 

Lemma 3.3. Let $F$ be a cumulative distribution function of $\log \Lambda_1$ that is
log-concave (i.e., $\log F(x)$ is a concave function). Then the process $(M_n)_{n \ge 0}$ that has
transition probabilities

$$P(M_{n+1} \le x \mid M_n = t) = P\left(R_{n+1}^{(\eta)} \le x \mid R_n^{(\eta)} = t, \; R_{n+1}^{(\eta)} < A\right)$$

is a stochastically monotone Markov process, i.e., $P(M_{n+1} > x \mid M_n = t)$ is
non-decreasing and right-continuous in $t$ for all $x$.

Remark 3.5. Note that the normal cumulative distribution function $\Phi(x - \mu)$ is
log-concave, so the log-likelihood ratio of two normals whose means differ has a
log-concave cdf. The same applies to two differing exponential distributions as well as to
two differing beta distributions considered in Section 4.

We are now prepared to state the desired result. The details of the proof are 
given in the Appendix. 

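Theorem 3.5. Suppose that the cdf of $\log \Lambda_1$ is log-concave both pre-change and
post-change. Let $1 < \gamma < \infty$ be fixed, and let $A_\gamma^r$ be such that the ARL to
false alarm of the SR-r procedure
$T_{A_\gamma^r}^r = \inf\{n \ge 1 : R_n^r \ge A_\gamma^r\}$ is $\gamma$. Then
$\mathrm{ADD}_\infty(T_{A_\gamma^r}^r)$ is an increasing function of $r$ and

$$\min_{0 \le r < \infty} \mathrm{ADD}_\infty(T_{A_\gamma^r}^r) = \mathrm{ADD}_\infty(T_{A_\gamma^0}^0) < \mathrm{ADD}_\infty(T_{A_\gamma^Q}^Q),$$

where $A_\gamma^Q$ is such that $E_\infty T_{A_\gamma^Q}^Q = \gamma$.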

3.3. Computing Constants $C_r$ and $C_\infty$. In order to implement the asymptotic
approximations we have to be able to compute the constants $C_r$ and $C_\infty$ defined by

$$C_r = \int_0^\infty \log(1 + r + y) \, dQ(y), \qquad C_\infty = \int_0^\infty \int_0^\infty \log(1 + x + y) \, dQ_{\mathrm{st}}(x) \, dQ(y), \qquad (3.22)$$

where $Q_{\mathrm{st}}(x) = \lim_{n \to \infty} P_\infty(R_n \le x)$ is the stationary distribution
of the SR statistic $R_n$ under $P_\infty$ and $Q(y) = \lim_{n \to \infty} P_0(V_n \le y)$ is the
limiting distribution of $V_n = \sum_{i=1}^{n} e^{-S_i}$ under $P_0$.

Assume that the distribution of $\Lambda_1$ is continuous. Then for computing the
constants we need to evaluate the two densities $q_{\mathrm{st}}(x) = dQ_{\mathrm{st}}(x)/dx$ and
$q(x) = dQ(x)/dx$. Let $R_\infty$ and $V_\infty$ be random variables that are the limits (in
distribution, as $n \to \infty$) of $R_n$ and $V_n$, respectively, which have densities
$q_{\mathrm{st}}(x)$ and $q(x)$. To find the desired densities, observe that, by recursion (1.3),
$R_\infty$ and $(1 + R_\infty)\Lambda_1$ have the same density $q_{\mathrm{st}}(x)$ under
$P_\infty$. Similarly, $V_\infty$ and $(1 + V_\infty)\Lambda_1^{-1}$ have the same density $q(x)$
under $P_0$. To see this, note that, by the i.i.d. property of the data, $V_n$ has the
same $P_0$-distribution as the random variable
$\tilde{V}_n = \sum_{i=1}^{n} \prod_{j=i}^{n} \Lambda_j^{-1}$, which follows the
recursion $\tilde{V}_n = (1 + \tilde{V}_{n-1})\Lambda_n^{-1}$. Therefore, we have the following
integral equations for these densities



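$$q_{\mathrm{st}}(x) = \int_0^\infty q_{\mathrm{st}}(y) \, \frac{\partial}{\partial x} F_\infty\left(\frac{x}{1 + y}\right) dy, \qquad q(x) = -\int_0^\infty q(y) \, \frac{\partial}{\partial x} F_0\left(\frac{1 + y}{x}\right) dy,$$

where $F_\infty(x) = P_\infty(\Lambda_1 \le x)$ and $F_0(x) = P_0(\Lambda_1 \le x)$ are the
corresponding distribution functions of the likelihood ratio $\Lambda_1$.

Thus, $q_{\mathrm{st}}(x)$ and $q(x)$ are the eigenfunctions corresponding to the unit
eigenvalues of the linear operators defined, respectively, with the kernels

$$\mathcal{K}_\infty(x, y) = \frac{\partial}{\partial x} F_\infty\left(\frac{x}{1 + y}\right), \qquad \mathcal{K}_0(x, y) = -\frac{\partial}{\partial x} F_0\left(\frac{1 + y}{x}\right).$$

The constants $C_r$ and $C_\infty$ are then obtained, usually by numerical integration.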
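The time-reversal argument behind the recursion for $\tilde{V}_n$ can be illustrated pathwise: the sum $V_n = \sum_{i=1}^{n} e^{-S_i}$ built on a sequence coincides with $\tilde{V}_n$ built on the same sequence read backwards, which is why the two have the same distribution for i.i.d. data. A minimal sketch, with hypothetical function names and arbitrary positive inputs in place of model-generated likelihood ratios:

```python
import math
import random


def v_forward(lams):
    """V_n = sum_{i=1}^n exp(-S_i), with S_i = log(Lambda_1) + ... + log(Lambda_i)."""
    total, s = 0.0, 0.0
    for lam in lams:
        s += math.log(lam)
        total += math.exp(-s)
    return total


def v_tilde(lams):
    """V~_n from the recursion V~_k = (1 + V~_{k-1}) / Lambda_k, started at V~_0 = 0."""
    v = 0.0
    for lam in lams:
        v = (1.0 + v) / lam
    return v


rng = random.Random(0)
# Arbitrary positive numbers standing in for likelihood ratios (illustration only).
lams = [rng.uniform(0.5, 2.0) for _ in range(15)]
forward_value = v_forward(lams)
reversed_value = v_tilde(lams[::-1])  # same data, read backwards
```

The two values agree up to floating-point rounding, while `v_tilde(lams)` on the unreversed sequence is in general a different number.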

The next section offers a comparative performance analysis for an example where 
Cr and Coo are computable analytically. 

4. Accuracy of Asymptotic Approximations: An Example. To verify the 
accuracy of the asymptotic approximations, we carried out an extensive performance 
evaluation of the procedures discussed in the earlier sections for the following example. 
Suppose $\{X_n\}_{n \ge 1}$ is a series of independent observations such that
$X_1, X_2, \ldots, X_\nu$ are beta(2, 1) each, and $X_{\nu+1}, X_{\nu+2}, \ldots$ are beta(1, 2)
each. Put another way, the series
undergoes a sudden and abrupt shift in the expected value from 2/3 pre-change to 
1/3 post-change, while retaining the variance. 

To be specific, our goal is to verify the conditions and the accuracy of the asymptotic
approximations given in Theorem 3.2, Theorem 3.4, and Remark 3.4, i.e.,

$$\mathrm{SADD}(T_A^{r^*}) \approx \mathrm{SADD}(T_A^Q) \approx \frac{1}{I}[\log A + \varkappa - C_\infty], \qquad \mathrm{SADD}(T_A) \approx \frac{1}{I}[\log A + \varkappa - C_0], \qquad (4.1)$$

and the approximations for the ARL to false alarm given in (3.4) and (3.5), i.e.,

$$E_\infty T_A^r \approx A/\zeta - r \quad \text{and} \quad E_\infty T_A^Q \approx A/\zeta - \mu_A, \qquad (4.2)$$

where $\mu_A$ is the mean of the quasi-stationary distribution.

To undertake this task, it is necessary to be able to calculate the constants $C_r$,
$C_\infty$, $\zeta$, and $\varkappa$ and also to compute the initialization point $r$ and the
mean $\mu_A$ of the quasi-stationary distribution $Q_A$. While usually the constants $C_r$ and
$C_\infty$ can be evaluated only numerically or by Monte Carlo, it turns out that for the beta
model considered these constants are computable analytically.

The pre- and post-change probability densities for this scenario are

$$f(x) = 2x \, \mathbb{1}_{\{0 \le x \le 1\}} \quad \text{and} \quad g(x) = 2(1 - x) \, \mathbb{1}_{\{0 \le x \le 1\}},$$

respectively, and the likelihood ratio for the $n$th observation is $\Lambda_n = 1/X_n - 1$.
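For readers who want to experiment with this beta model, both densities can be sampled by inverting their distribution functions ($x^2$ pre-change and $1 - (1-x)^2$ post-change). The sketch below is an illustration under stated assumptions, not the numerical method used in the paper: the function names, the threshold, and the number of runs are hypothetical choices, and the delay is estimated only for the case $\nu = 0$.

```python
import math
import random


def sample_pre(rng):
    """X ~ beta(2,1): cdf x^2, so X = sqrt(U); 1 - random() keeps X in (0, 1]."""
    return math.sqrt(1.0 - rng.random())


def sample_post(rng):
    """X ~ beta(1,2): cdf 1 - (1-x)^2, so X = 1 - sqrt(U'); X stays in (0, 1]."""
    return 1.0 - math.sqrt(rng.random())


def likelihood_ratio(x):
    """Lambda = g(x)/f(x) = (1 - x)/x = 1/x - 1."""
    return 1.0 / x - 1.0


def sr_r_delay_at_zero(threshold, r, rng):
    """One run of the SR-r rule when the change is in effect from the start (nu = 0)."""
    stat, n = r, 0
    while True:
        n += 1
        stat = (1.0 + stat) * likelihood_ratio(sample_post(rng))
        if stat >= threshold:
            return n


def mean_delay_at_zero(threshold, r, runs, seed=0):
    """Monte Carlo estimate of E_0 T_A^r over `runs` independent replications."""
    rng = random.Random(seed)
    return sum(sr_r_delay_at_zero(threshold, r, rng) for _ in range(runs)) / runs
```

For instance, `mean_delay_at_zero(100.0, 0.0, 5000)` estimates the delay of the original SR procedure at $\nu = 0$, and a positive head start such as `r=2.0` should shorten it.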
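The quasi-stationary distribution satisfies the integral equation

$$\lambda_A Q_A(x) = \int_0^A F_\infty\left(\frac{x}{1 + y}\right) dQ_A(y),$$

where $F_\infty(t) = P_\infty(\Lambda_1 \le t)$ and, setting $x = A$,
$\lambda_A = \int_0^A F_\infty(A/(1 + y)) \, dQ_A(y)$.

Since for any $t \ge 0$

$$P_\infty(\Lambda_1 \le t) = 1 - P_\infty\left(X_1 \le \frac{1}{1 + t}\right) = 1 - (1 + t)^{-2}, \qquad (4.3)$$

which is continuously differentiable with respect to $t$, the equation for the
quasi-stationary distribution $Q_A(x)$ for this model is

$$\lambda_A Q_A(x) = 1 - \int_0^A \frac{(1 + y)^2}{(1 + x + y)^2} \, dQ_A(y). \qquad (4.4)$$

Note that the quasi-stationary distribution converges to the stationary distribution,
$Q_A(x) \to Q_{\mathrm{st}}(x)$ as $A \to \infty$ (cf. Pollak and Siegmund [8]).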

Equation (4.4) cannot be solved analytically for an arbitrary finite $A$, but its
limiting value when $A \to \infty$ (i.e., the stationary distribution of the statistic $R_n^r$)
does permit a closed-form solution. By (4.4), the stationary distribution $Q_{\mathrm{st}}(x)$
satisfies the following equation

$$Q_{\mathrm{st}}(x) = 1 - \int_0^\infty \frac{(1 + y)^2}{(1 + x + y)^2} \, dQ_{\mathrm{st}}(y),$$

and the solution is $Q_{\mathrm{st}}(x) = [x/(1 + x)] \, \mathbb{1}_{\{x \ge 0\}}$.
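The claimed solution can be checked numerically by substituting the density $q_{\mathrm{st}}(y) = 1/(1+y)^2$ into the right-hand side of the stationary equation. The following is a minimal sketch (function names and the quadrature scheme are assumptions made for illustration); the change of variable $y = u/(1-u)$ maps the half-line onto the unit interval so that a plain midpoint rule suffices.

```python
def q_st(y):
    """Candidate stationary density q_st(y) = d/dy [y/(1+y)] = 1/(1+y)^2."""
    return 1.0 / (1.0 + y) ** 2


def stationary_rhs(x, n=20_000):
    """1 - int_0^inf ((1+y)/(1+x+y))^2 q_st(y) dy, via the midpoint rule.

    The substitution y = u/(1-u), dy = du/(1-u)^2 maps [0, inf) onto [0, 1).
    """
    h, total = 1.0 / n, 0.0
    for i in range(n):
        u = (i + 0.5) * h
        y = u / (1.0 - u)
        kernel = ((1.0 + y) / (1.0 + x + y)) ** 2
        total += kernel * q_st(y) / (1.0 - u) ** 2
    return 1.0 - h * total


def q_st_cdf(x):
    """Closed-form candidate Q_st(x) = x/(1+x) for x >= 0."""
    return x / (1.0 + x)
```

Comparing `stationary_rhs(x)` with `q_st_cdf(x)` at a few points gives agreement up to quadrature error, consistent with the closed form stated above.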

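To derive the equation for $Q(x)$, observe first that for $t \ge 0$

$$P_0(1/\Lambda_1 \le t) = P_0(\Lambda_1 \ge 1/t) = P_0\left(X_1 \le \frac{t}{1 + t}\right) = 1 - \left(1 - \frac{t}{1 + t}\right)^2 = 1 - (1 + t)^{-2},$$

which is identical to $P_\infty(\Lambda_1 \le t)$. As a result, the distribution $Q(x)$ satisfies
precisely the same equation as $Q_{\mathrm{st}}(x)$ and, therefore,

$$Q(x) = Q_{\mathrm{st}}(x) = \frac{x}{1 + x} \, \mathbb{1}_{\{x \ge 0\}}. \qquad (4.5)$$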
Using (3.22) and (4.5), one is able to calculate the constants $C_r$ and $C_\infty$ exactly as

$$C_r = \frac{1+r}{r}\log(1+r) \quad\text{and}\quad C_\infty = \frac{\pi^2}{6} \approx 1.6449. \qquad (4.6)$$

In particular, $C_0 = 1$.
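
The closed forms in (4.6) can be cross-checked numerically. The sketch below is illustrative only and rests on two assumptions taken from the surrounding text: $C_r = E\log(1 + r + V_\infty)$ with $V_\infty$ distributed according to $Q(x) = x/(1+x)$, and $C_\infty$ equal to the average of $C_r$ over $r$ drawn from $Q_{\rm st}$. The function names and the midpoint quadrature are choices made for this example, not the authors' method.

```python
import math

def midpoint(f, n=200_000):
    """Midpoint rule on (0, 1); it tolerates the integrable log singularity at 0."""
    h = 1.0 / n
    return h * sum(f((i + 0.5) * h) for i in range(n))

def c_r_closed(r):
    """Closed form (4.6): C_r = (1 + r) / r * log(1 + r), with limit 1 at r = 0."""
    return 1.0 if r == 0 else (1.0 + r) / r * math.log1p(r)

def c_r_numeric(r):
    """E[log(1 + r + V)] for V with cdf x / (1 + x), after substituting u = 1 / (1 + x)."""
    return midpoint(lambda u: math.log(r + 1.0 / u))

def c_inf_numeric():
    """Average of C_R over R with cdf x / (1 + x), after substituting u = 1 / (1 + R)."""
    return midpoint(lambda u: -math.log(u) / (1.0 - u))

if __name__ == "__main__":
    print(c_r_numeric(1.98), c_r_closed(1.98))
    print(c_inf_numeric(), math.pi ** 2 / 6.0)
```

Both substitutions map the half-line onto $(0,1)$, so a single quadrature routine serves for $C_r$ and $C_\infty$.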

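Note that the Kullback-Leibler information number $I = 1$, so that

$$\mathrm{SADD}(T_A^{r}) \approx \mathrm{SADD}(T_A^{Q_A}) \approx \log A + \varkappa - 1.6449, \qquad \mathrm{SADD}(T_A) \approx \log A + \varkappa - 1.$$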



Unfortunately, neither the limiting average overshoot $\varkappa$ nor the limiting average
"exponential" overshoot $\zeta$ is computable exactly. Monte Carlo simulations with
10^ trials have been used to estimate the two as $\varkappa \approx 1.255$ and $\zeta \approx 0.426$ with the
standard error less than 10^-. Specifically, these estimates were obtained from the
formulas

$$\varkappa = \frac{E_0[S_1^2]}{2E_0[S_1]} - \sum_{k=1}^{\infty} \frac{1}{k}\,E_0[S_k^-], \qquad
\zeta = \frac{1}{I}\exp\Bigl\{-\sum_{k=1}^{\infty} \frac{1}{k}\bigl[P_0(S_k \le 0) + P_\infty(S_k > 0)\bigr]\Bigr\},$$

where $S_k = \sum_{j=1}^{k} \log\Lambda_j$, $S_k^- = -\min(0, S_k)$, and $I = E_0[\log\Lambda_1]$ (see, e.g., Woodroofe [15]).
The first fraction in the first formula is computable analytically. The only issue is
the infinite sum. To evaluate this sum, it was first truncated at $10^5$ terms. An extreme-
value-theoretic argument shows that the weight of the dropped tail is of the order of the
machine precision. This makes it safe to assume that the sum of the first $10^5$ terms
is effectively equal to the original infinite sum. The second source of error is the
expectations under the sum. These expectations are not computable analytically,
and therefore Monte Carlo simulations were used: we generated 10^ trajectories of
$S_1, S_2, \ldots, S_{100000}$ and averaged across the trajectories to find $E_0[S_k^-]$ for
each $k = 1, 2, \ldots, 100000$. The sum in the second formula was evaluated in a similar
manner.
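
The Monte Carlo recipe just described can be sketched on a much smaller scale. Everything below is an illustrative assumption rather than the authors' code: the function names, the number of paths, the truncation point, and the value $E_0[S_1^2] = \pi^2/3$ (obtained for this model by direct integration, not quoted from the text). Observations are drawn by inverse-cdf sampling from $f(x) = 2x$ and $g(x) = 2(1-x)$, and $S_k^- = -\min(0, S_k)$ is taken as the negative part.

```python
import math
import random

def log_lr(rng, post_change):
    """Sample log(Lambda) = log((1 - X) / X) with X ~ g (post-change) or X ~ f."""
    s = math.sqrt(1.0 - rng.random())        # square root of a uniform on (0, 1]
    x = 1.0 - s if post_change else s        # inverse-cdf draw from g or from f
    x = min(max(x, 1e-15), 1.0 - 1e-15)      # keep the logarithm finite
    return math.log((1.0 - x) / x)

def overshoot_constants(n_paths=4000, k_max=60, seed=1):
    """Crude estimates of (kappa, zeta) from truncated sums over k = 1..k_max."""
    rng = random.Random(seed)
    neg = [0.0] * k_max      # running sums of S_k^- under P_0
    p0 = [0] * k_max         # counts of {S_k <= 0} under P_0
    pinf = [0] * k_max       # counts of {S_k > 0} under P_inf
    for _ in range(n_paths):
        s0 = sinf = 0.0
        for k in range(k_max):
            s0 += log_lr(rng, True)
            sinf += log_lr(rng, False)
            if s0 <= 0.0:
                neg[k] -= s0
                p0[k] += 1
            if sinf > 0.0:
                pinf[k] += 1
    info = 1.0                                      # Kullback-Leibler number I
    first = (math.pi ** 2 / 3.0) / (2.0 * info)     # assumed E_0[S_1^2] / (2 E_0[S_1])
    kappa = first - sum(neg[k] / n_paths / (k + 1) for k in range(k_max))
    zeta = math.exp(-sum((p0[k] + pinf[k]) / n_paths / (k + 1)
                         for k in range(k_max))) / info
    return kappa, zeta

if __name__ == "__main__":
    print(overshoot_constants())
```

With so few paths the estimates are only rough; the point is the structure of the computation, namely one pass over simulated random walks that accumulates all terms of both sums at once.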

Despite the fact that in the example considered the distributions $Q_{\rm st}(x)$ and
$Q(x)$ are obtainable exactly, neither the quasi-stationary distribution, required for the
SRP procedure, nor the conditional average delay to detection $E_\nu(T - \nu \mid T > \nu)$ for
$\nu \ge 0$ and the ARL to false alarm seem feasible to obtain analytically. To overcome this
difficulty, these quantities were computed numerically, using the approach undertaken
by Moustakides et al. [4] with the number of breakpoints set at 5 x 10*, high enough 
to ensure a relative error in the order of a fraction of a percent. 

Specifically, let $j \in \{\infty, 0\}$. For $A > 0$, $r \ge 0$, and $\nu \ge 0$, define $\phi_j(r) = E_j T_A^r$,
$\delta_\nu(r) = E_\nu[(T_A^r - \nu)^+]$, $\rho_\nu(r) = P_\infty(T_A^r > \nu)$, and $F_j(x) = P_j(\Lambda_1 \le x)$. Using the
Markov property of the statistic $R_n^r$, the following integral equations and recursions
for operating characteristics are obtained by Moustakides et al. [4]:

$$\phi_j(r) = 1 + \int_0^A \phi_j(x)\,\frac{d}{dx}F_j\Bigl(\frac{x}{1+r}\Bigr)\,dx, \qquad j = 0, \infty,$$

$$\delta_\nu(r) = \int_0^A \delta_{\nu-1}(x)\,\frac{d}{dx}F_\infty\Bigl(\frac{x}{1+r}\Bigr)\,dx, \qquad \nu \ge 1,$$

$$\rho_\nu(r) = \int_0^A \rho_{\nu-1}(x)\,\frac{d}{dx}F_\infty\Bigl(\frac{x}{1+r}\Bigr)\,dx, \qquad \nu \ge 1,$$

with the initial conditions $\delta_0(r) = E_0 T_A^r = \phi_0(r)$ and $\rho_0(r) = 1$. These integral
equations yield the ARL to false alarm $E_\infty T_A^r$ and the sequence of average detection
delays $E_\nu(T_A^r - \nu \mid T_A^r > \nu) = \delta_\nu(r)/\rho_\nu(r)$, $\nu = 0, 1, \ldots$, as functions of the starting point
$r \ge 0$. The distribution function $F_\infty(x)$ is defined in (4.3) and $F_0(x) = [x/(1+x)]^2$,
$x \ge 0$.

In order to implement the SRP procedure $T_A^{Q_A}$ as well as to evaluate its
performance, we need to compute the quasi-stationary distribution $Q_A(x)$, which satisfies
the integral equation (4.4). The ARL to false alarm $E_\infty T_A^{Q_A}$ and the average detection
delay $E_0 T_A^{Q_A}$ of the SRP procedure are then computed as

$$E_j T_A^{Q_A} = \int_0^A E_j[T_A^r]\,dQ_A(r) = \int_0^A \phi_j(r)\,dQ_A(r), \qquad j = \infty, 0.$$

We recall that the SRP procedure is an equalizer, so that
$E_\nu(T_A^{Q_A} - \nu \mid T_A^{Q_A} > \nu) = E_0 T_A^{Q_A}$ for all $\nu \ge 1$.



Finally, by Moustakides et al. [4], the lower bound $J(T_A)$ given in Lemma 3.2 is
computed as $J(T_A) = \psi(0)/\phi_\infty(0)$, where $\psi(r)$ is the solution of the integral equation

$$\psi(r) = \phi_0(r) + \int_0^A \psi(x)\,\frac{d}{dx}F_\infty\Bigl(\frac{x}{1+r}\Bigr)\,dx.$$

The above integral equations allow us to compute numerically the operating
characteristics of both the SR-r and SRP procedures as well as the mean $\mu_A$ of the
quasi-stationary distribution.
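
To illustrate how such an integral equation can be solved numerically, here is a minimal sketch for the ARL to false alarm $\phi_\infty(r) = 1 + \int_0^A \phi_\infty(x)\,\frac{d}{dx}F_\infty\bigl(\frac{x}{1+r}\bigr)\,dx$ with $F_\infty(t) = 1 - (1+t)^{-2}$. It assumes a piecewise-constant approximation of $\phi_\infty$ on equal cells of $[0, A)$ with exactly integrated transition probabilities; this is a simplification chosen for the example and is not claimed to be the scheme of Moustakides et al. [4]. All function names are hypothetical.

```python
def F_inf(t):
    """Pre-change cdf of the likelihood ratio, (4.3): 1 - (1 + t)^(-2) for t >= 0."""
    return 1.0 - (1.0 + t) ** -2 if t > 0.0 else 0.0

def solve(M, b):
    """Gaussian elimination with partial pivoting for the dense system M x = b."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(M)]
    for c in range(n):
        p = max(range(c, n), key=lambda i: abs(aug[i][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for i in range(c + 1, n):
            f = aug[i][c] / aug[c][c]
            if f:
                for k in range(c, n + 1):
                    aug[i][k] -= f * aug[c][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (aug[i][n] - sum(aug[i][k] * x[k] for k in range(i + 1, n))) / aug[i][i]
    return x

def cell_probs(r, A, n_cells):
    """P_inf(R_1 falls in cell j | R_0 = r) for equal cells covering [0, A)."""
    h = A / n_cells
    return [F_inf((j + 1) * h / (1.0 + r)) - F_inf(j * h / (1.0 + r))
            for j in range(n_cells)]

def arl_to_false_alarm(r, A, n_cells=150):
    """Approximate phi_inf(r) = E_inf T_A^r with phi taken constant on each cell."""
    h = A / n_cells
    mids = [(i + 0.5) * h for i in range(n_cells)]
    M = []
    for i, m in enumerate(mids):
        row = [-p for p in cell_probs(m, A, n_cells)]
        row[i] += 1.0
        M.append(row)
    phi = solve(M, [1.0] * n_cells)          # (I - P) phi = 1 at the cell midpoints
    return 1.0 + sum(p * v for p, v in zip(cell_probs(r, A, n_cells), phi))

if __name__ == "__main__":
    print(arl_to_false_alarm(0.0, 21.0))     # SR procedure (r = 0) with A = 21
```

The same linear system, with $F_0$ in place of $F_\infty$, would give $\phi_0(r)$, and repeated application of the transition matrix gives the recursions for $\delta_\nu$ and $\rho_\nu$.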



Fig. 4.1. The mean $\mu_A$ of the quasi-stationary distribution $Q_A(x)$ as a function of the detection
threshold $A$. The $\log(A)$ function is plotted to demonstrate that $\mu_A = \log A + O(1)$.



At this point the only unresolved question is that of how to choose $r$. To this end,
several options are proposed by Moustakides et al. [4]; one of the options is $r = \mu_A$.
Recall that Theorem 3.4 requires (a) $r = o(A)$ as $A \to \infty$ and (b) $\mathrm{SADD}(T_A^r) =
\mathrm{ADD}_\infty(T_A^r)$. If $r = \mu_A$, then condition (a) is satisfied, since by (3.7) and (3.8),
$\mu_A \le O(\log A)$ and $\mu_A = \log A + O(1)$. This is also illustrated in Figure 4.1, which
shows that the inequality $\mu_A \le O(\log A)$ and the equality $\mu_A = \log A + O(1)$ are
indeed satisfied.

Condition (b) is also satisfied even for small values of the ARL to false alarm,
as can be seen from Figure 4.2, which shows how $E_\nu(T - \nu \mid T > \nu)$ evolves as $\nu$ runs from
0 to 10 for the SRP procedure and for the SR-r procedure with $r = \mu_A$. The ARL to
false alarm is about 100 for both procedures. Observe that the SR-r procedure attains
its supremum at infinity, i.e., as $\nu \to \infty$. Also, the stationary regime kicks in as early as
&i V = Q, and this is for Eao[T] ~ 100. In addition. Figure 4.2 illustrates Theorem 3.5 
- $\mathrm{ADD}_\infty(T)$ is indeed the smallest for the SR procedure, although the difference is small.
We reiterate that it is easily shown that the log-concavity conditions of Theorem 3.5
hold in the example considered, i.e., $\log[P_\infty(\log\Lambda_1 \le x)]$ and $\log[P_0(\log\Lambda_1 \le x)]$ are
concave functions.
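
For completeness, the log-concavity claim can be verified directly. The derivation below is a worked check added here; it assumes the distribution functions of this example, $P_\infty(\Lambda_1 \le t) = 1 - (1+t)^{-2}$ and $P_0(\Lambda_1 \le t) = [t/(1+t)]^2$, evaluated at $t = e^x$:

```latex
\log P_0(\log\Lambda_1 \le x) = 2x - 2\log(1+e^x),
\qquad
\frac{d^2}{dx^2}\log P_0(\log\Lambda_1 \le x) = -\frac{2e^x}{(1+e^x)^2} < 0,
\\[6pt]
\log P_\infty(\log\Lambda_1 \le x) = x + \log(2+e^x) - 2\log(1+e^x),
\qquad
\frac{d^2}{dx^2}\log P_\infty(\log\Lambda_1 \le x)
  = \frac{2e^x}{(2+e^x)^2} - \frac{2e^x}{(1+e^x)^2} < 0.
```

Both second derivatives are negative for every real $x$, the second one because $2 + e^x > 1 + e^x$.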

Table 4.1 provides values of the supremum average delay to detection SADD and
the lower bound $J(T_A)$ versus the ARL to false alarm $E_\infty[T]$. Also presented in
parentheses are the corresponding theoretical predictions based on the asymptotic
approximations (4.1) and (4.2). It is seen that the approximations for the ARL to
false alarm are fairly accurate even for small values of the ARL such as 50, while the
approximations for the SADD and the lower bound become accurate for moderate
false alarm rates (ARL = 500 and higher).
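
Entries of this kind can also be spot-checked by direct simulation. The sketch below assumes the usual recursion $R_n = (1 + R_{n-1})\Lambda_n$ with $R_0 = r$ for the SR-r statistic (the recursion itself is not restated in this section), draws observations by inverse-cdf sampling, and estimates the ARL to false alarm and the delay at $\nu = 0$. Function names, run counts, and the seed are arbitrary choices for the example.

```python
import math
import random

def draw_lr(rng, post_change):
    """Likelihood ratio (1 - X) / X with X ~ g (post-change) or X ~ f (pre-change)."""
    s = math.sqrt(1.0 - rng.random())        # square root of a uniform on (0, 1]
    x = 1.0 - s if post_change else s        # inverse-cdf draw from g or from f
    x = min(max(x, 1e-15), 1.0 - 1e-15)      # avoid division by zero
    return (1.0 - x) / x

def run_length(rng, A, r=0.0, post_change=False):
    """First n >= 1 with R_n >= A, where R_n = (1 + R_{n-1}) * Lambda_n and R_0 = r."""
    R, n = r, 0
    while True:
        n += 1
        R = (1.0 + R) * draw_lr(rng, post_change)
        if R >= A:
            return n

def estimate(A, r=0.0, n_runs=20000, seed=7):
    """Monte Carlo estimates of (E_inf T, E_0 T) for threshold A and head start r."""
    rng = random.Random(seed)
    arl = sum(run_length(rng, A, r, False) for _ in range(n_runs)) / n_runs
    add0 = sum(run_length(rng, A, r, True) for _ in range(n_runs)) / n_runs
    return arl, add0

if __name__ == "__main__":
    print(estimate(21.0))    # SR procedure with A = 21.0, first column of Table 4.1
```

For the SR procedure the delay at $\nu = 0$ is the quantity to compare with the tabulated SADD; for other starting points the supremum may be attained elsewhere, so this check covers only the $\nu = 0$ delay.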
Fig. 4.2. Results of numerical evaluation of the conditional average detection delay vs.
changepoint $\nu$ of the SR, SRP and SR-r ($r = \mu_A$) procedures. The ARL to false alarm $E_\infty[T] \approx 100$.
Another possible way of starting the SR-r procedure is from the value of $r$ for
which the average detection delay at the point $\nu = 0$ is equal (at least approximately)
to the $\mathrm{ADD}_\infty$ (i.e., in the steady-state mode), as has been proposed in Section 3.2.
In the asymptotic setting this is equivalent to finding a point $r = r^*$ for which $C_\infty$ is
equal to $C_r$. Clearly, $r^*$ is a fixed number that does not depend on $A$ since $C_\infty$ does
not depend on $r$ and $A$. Using (4.6), we obtain the transcendental equation

$$\frac{1+r^*}{r^*}\log(1+r^*) = \frac{\pi^2}{6},$$

and the solution is $r^* \approx 1.98$.
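
The root of this equation is easy to locate numerically. The following sketch uses plain bisection on $C_r = \frac{1+r}{r}\log(1+r)$, which is increasing in $r > 0$; the bracket $[1, 3]$ and the tolerance are assumptions made for the example.

```python
import math

def c_r(r):
    """Closed form (4.6): C_r = (1 + r) / r * log(1 + r) for r > 0."""
    return (1.0 + r) / r * math.log1p(r)

def solve_r_star(lo=1.0, hi=3.0, tol=1e-10):
    """Bisection for C_r = pi^2 / 6 on a bracket where C_r crosses the target."""
    target = math.pi ** 2 / 6.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if c_r(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

if __name__ == "__main__":
    print(solve_r_star())    # a value between 1.98 and 1.99
```

Since $C_1 = 2\log 2 < \pi^2/6 < \tfrac{4}{3}\log 4 = C_3$, the bracket contains exactly one root.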



Fig. 4.3. Conditional average detection delay vs. changepoint $\nu$ for the SRP procedure and for
the SR-r procedure with $r = r^* = 1.98$. The ARL to false alarm $E_\infty[T] \approx 100$.



Figure 4.3 shows the average delay to detection $E_\nu[T - \nu \mid T > \nu]$ versus the
changepoint $\nu$ for the SR-r procedure with $r = r^* = 1.98$ and for the SRP procedure.
Observe that for the SR-r procedure the average delay at $\nu = 0$ is equal to that at
infinity, as was planned. More importantly, the point $\nu = 0$ is the worst (supremum)
point (along with large $\nu$). Also, it can be seen that the SR-r procedure is uniformly
(i.e., for all $\nu \ge 0$) better than the SRP procedure, although in this example the
difference is practically negligible. We also note that this initialization is better than
starting off at the mean of the quasi-stationary distribution, although the difference in
performance is very small: the SADD is equal to 3.54 for $r = \mu_A$ and 3.52 for $r = r^*$.
This allows us to conclude that the SR-r procedure is robust with respect to the
initialization point in a certain range.



Table 4.1
Summary of the results of numerical evaluation of operating characteristics of the SR, SRP and SR-r procedures. Numbers in parentheses are computed
using the asymptotic approximations.

| Test | $\gamma$           | 50              | 100             | 500               | 1000              | 10000                |
|------|--------------------|-----------------|-----------------|-------------------|-------------------|----------------------|
| SR   | $A$                | 21.0            | 42.0            | 212.0             | 424.5             | 4256.0               |
|      | ARL to false alarm | 50.412 (49.342) | 99.832 (98.684) | 499.866 (498.12)  | 999.797 (997.415) | 9999.675 (10000.0)   |
|      | SADD               | 3.407 (3.312)   | 4.051 (4.005)   | 5.622 (5.615)     | 6.309 (6.308)     | 8.607 (8.611)        |
| SRP  | $A$                | 21.5            | 43.0            | 213.5             | 426.5             | 4259.0               |
|      | ARL to false alarm | 49.635 (48.48)  | 99.664 (98.431) | 499.424 (497.595) | 999.87 (997.404)  | 9999.81 (10000.066)  |
|      | SADD               | 2.942 (2.668)   | 3.534 (3.361)   | 5.021 (4.97)      | 5.692 (5.663)     | 7.965 (7.966)        |
| SR-r | $A$                | 21.5            | 43.0            | 213.5             | 426.5             | 4259.0               |
|      | $r = \mu_A$        | 2.037           | 2.603           | 4.052             | 4.711             | 6.982                |
|      | ARL to false alarm | 49.554 (48.48)  | 99.582 (98.431) | 500.52 (497.595)  | 999.792 (997.404) | 9999.735 (10000.066) |
|      | SADD               | 2.942 (2.668)   | 3.534 (3.361)   | 5.023 (4.97)      | 5.692 (5.663)     | 7.965 (7.966)        |
|      | Lower Bound        | 2.939 (2.668)   | 3.523 (3.361)   | 5.017 (4.97)      | 5.688 (5.663)     | 7.965 (7.966)        |
5. Conclusions. We considered three versions of the Shiryaev-Roberts
procedure that differ in the starting point: the conventional SR procedure,
where $R_0 = 0$; Pollak's modification of this procedure, where $R_0$ is sampled
from the quasi-stationary distribution of the SR statistic; and a generalization
where $R_0 = r$ is a specially designed deterministic number, proposed by
Moustakides et al. [4]. For each of the procedures we derived asymptotic formulas for
operating characteristics when the threshold $A$ is high and showed that
when the ARL to false alarm is large the SR-r and SRP procedures are both
asymptotically third-order minimax. We emphasize that third-order asymptotic
optimality of the SR-r procedure has been established under the conjecture that the worst
changepoint is at infinity, which is justified numerically for several examples, including
the one considered in Section 4. Unfortunately, we have not been able to prove this
conjecture analytically. In addition, we performed a comparative efficiency analysis
of the detection procedures to verify the accuracy of the asymptotic approximations
and demonstrated the proximity of the latter to the real values for a specific example.
The results of numerical analysis allow us to conclude that (in the minimax sense)
the performance of the SR-r procedure that starts either at the mean of the quasi-stationary
distribution or at a point that equates the average detection delay at zero
and infinity is almost indistinguishable from that of the SRP procedure.

Acknowledgement. We are grateful to a referee for useful suggestions that have 
improved the presentation in the article. 



Appendix. Auxiliary Results and Proofs. 

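Lemma A.1. Let $Y_1, Y_2, \ldots$ be i.i.d. with $EY_1 = 0$ and $EY_1^2 = \sigma^2 < \infty$. Let
$S_n = Y_1 + \cdots + Y_n$. Then for all $\varepsilon > 0$

$$N\,P\Bigl(\max_{1 \le n \le N} |S_n| \ge \varepsilon N\Bigr) \xrightarrow[N \to \infty]{} 0.$$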



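Proof. Applying Doob's maximal submartingale inequality to the submartingale
$S_n^2$, we obtain

$$P\Bigl(\max_{n \le N}|S_n| \ge \varepsilon N\Bigr)
\le \frac{1}{\varepsilon^2 N^2}\,E\Bigl[S_N^2\,\mathbb{1}_{\{\max_{n \le N}|S_n| \ge \varepsilon N\}}\Bigr]
= \frac{1}{\varepsilon^2 N}\,E\Bigl[\frac{S_N^2}{N}\,\mathbb{1}_{\{\max_{n \le N}|S_n| \ge \varepsilon N\}}\Bigr].$$

First, it follows that

$$P\Bigl(\max_{n \le N}|S_n| \ge \varepsilon N\Bigr) \le \frac{\sigma^2}{\varepsilon^2 N} \xrightarrow[N \to \infty]{} 0.$$

Now, we show that

$$E\Bigl[\frac{S_N^2}{N}\,\mathbb{1}_{\{\max_{n \le N}|S_n| \ge \varepsilon N\}}\Bigr] \xrightarrow[N \to \infty]{} 0,$$

which implies that $P(\max_{n \le N}|S_n| \ge \varepsilon N) = o(1/N)$ as $N \to \infty$, i.e., the desired result.

By the second moment condition, $E(S_N^2/N) = \sigma^2 < \infty$. Hence, by the Central
Limit Theorem,

$$\frac{S_N^2}{N} \xrightarrow[N \to \infty]{\text{law}} \sigma^2\xi^2,$$

where $\xi$ is a standard normal random variable.
Finally, for any $L < \infty$ we have

$$E\Bigl[\frac{S_N^2}{N}\,\mathbb{1}_{\{\max_{n \le N}|S_n| \ge \varepsilon N\}}\Bigr]
= E\Bigl[\Bigl(L \wedge \frac{S_N^2}{N}\Bigr)\mathbb{1}_{\{\max_{n \le N}|S_n| \ge \varepsilon N\}}\Bigr]
+ E\Bigl[\Bigl(\frac{S_N^2}{N} - L \wedge \frac{S_N^2}{N}\Bigr)\mathbb{1}_{\{\max_{n \le N}|S_n| \ge \varepsilon N\}}\Bigr]$$

$$\le L\,P\Bigl(\max_{n \le N}|S_n| \ge \varepsilon N\Bigr) + \sigma^2 - E\Bigl(L \wedge \frac{S_N^2}{N}\Bigr)
\xrightarrow[N \to \infty]{} \sigma^2 - E\bigl(L \wedge \sigma^2\xi^2\bigr)
\xrightarrow[L \to \infty]{} \sigma^2(1 - 1) = 0. \qquad \Box$$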

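Proof of Theorem 3.1. It follows from Pollak [7] that for the SR procedure

$$E_\infty T_A = (A/\zeta)(1 + o(1)), \quad A \to \infty.$$

Since $R_n^r = r e^{S_n} + R_n \ge R_n$, we have

$$E_\infty T_A^r \le E_\infty T_A = (A/\zeta)(1 + o(1)) \quad \text{for any } r \ge 0. \qquad (A.1)$$

For some positive $m$, define

$$M = \inf\{n \ge 1 : r e^{S_n} \ge m\}.$$

Observe that

$$E_\infty T_A^r = E_\infty(T_A^r;\, T_A^r < M) + E_\infty(T_A^r;\, T_A^r \ge M)$$

and

$$R_{T_A^r} = R^r_{T_A^r} - r\exp\{S_{T_A^r}\} \ge A - m \quad \text{on } \{T_A^r < M\}.$$

Hence, $T_{A-m} \le T_A^r$ on $\{T_A^r < M\}$, which implies that

$$E_\infty(T_A^r;\, T_A^r < M) \ge E_\infty(T_{A-m};\, T_A^r < M).$$

Therefore, we have the following chain of equalities and inequalities:

$$E_\infty T_A^r = E_\infty(T_A^r;\, T_A^r < M) + E_\infty(T_A^r;\, T_A^r \ge M)$$
$$\ge E_\infty(T_{A-m};\, T_A^r < M) + E_\infty(T_A^r;\, T_A^r \ge M)$$
$$= E_\infty T_{A-m} + E_\infty(T_A^r - T_{A-m};\, T_A^r \ge M)$$
$$\ge E_\infty T_{A-m} + E_\infty(T_A^r - T_{A-m};\, T_{A-m} > T_A^r \ge M)$$
$$= E_\infty T_{A-m} - E_\infty(T_{A-m} - T_A^r;\, T_{A-m} > T_A^r \ge M)$$
$$= E_\infty T_{A-m} - E_\infty(T_{A-m} - T_A^r \mid T_{A-m} > T_A^r \ge M)\,P_\infty(T_{A-m} > T_A^r \ge M)$$
$$\ge E_\infty T_{A-m} - E_\infty T_{A-m}\,P_\infty(M < \infty),$$

where the last inequality stems from $\{T_{A-m} > T_A^r \ge M\} \subset \{M < \infty\}$ (so that
$P_\infty\{T_{A-m} > T_A^r \ge M\} \le P_\infty\{M < \infty\}$) and from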
Eoo(Ta-™ - TX\TA-,n >Tl^ M) ^ Eoo{Eoo [i?T^_„ - Tl\Tl] \TA~m >T^^ M} 
Note that $e^{S_n}$ is a nonnegative $P_\infty$-martingale with mean 1, so that

$$P_\infty(M < \infty) = P_\infty\bigl(\inf\{n : e^{S_n} \ge m/r\} < \infty\bigr) \le r/m, \qquad (A.2)$$

and we obtain

$$E_\infty T_A^r \ge E_\infty T_{A-m}(1 - r/m) = \frac{A-m}{\zeta}(1 - r/m)(1 + o(1))
= (A/\zeta)(1 - m/A)(1 - r/m)(1 + o(1)). \qquad (A.3)$$

Let $r_A \to \infty$ and $m = m_A \to \infty$ so that $r_A/m_A \to 0$ and $m_A/A \to 0$ (which can
always be arranged). Then, uniformly in $0 \le r \le r_A$,

$$E_\infty T_A^r \ge (A/\zeta)(1 + o(1)),$$

which along with the reverse inequality (A.1) proves asymptotic equality (3.2)
whenever $r_A = o(A)$ as $A \to \infty$, and if $r$ does not depend on $A$ the result obviously
holds.

Similar to (A.2),

$$P_\infty(M < \infty \mid R_0^{Q_A} = x) \le x/m_A.$$

Thus, for the SRP procedure, by conditioning on $R_0^{Q_A}$, we obtain

$$P_\infty(M < \infty) = \int_0^A P_\infty(M < \infty \mid R_0^{Q_A} = x)\,dQ_A(x) \le \frac{1}{m_A}\int_0^A x\,dQ_A(x) = \frac{\mu_A}{m_A},$$

where $\mu_A = E R_0^{Q_A} = \int_0^A x\,dQ_A(x)$ is the mean of the quasi-stationary distribution.
By (3.7), [lAJmA ^ 0(m^^ log A) and to obtain (3.3) it suffices to take ttt-a = A^^'^ 
(say). D 

Proof of Lemma 3.1. For any v ^ 0, the SR-r statistic can be written as 

K^^^{i+K) n ^'+ n ^0 E n k'] 

Ciy-'rn \ / ;>+n— 1 

n ^0 1+^::+ E «" 
i=iy+l / \ ]=v+l 

Thus, we have 



e-'". 



+1 



iogi?;:+„ = Y. log A, + log 1 + i?:; + Y. 

i=u+l 

where K,n = Y.'jtu+i ^~^' ■ 

On {T\ > v} the stopping time T\ can be written as 

T; . i„f {,. , 1 : Jf Z. + log (l + ^) > log (^) j , (A^4) 
Note that on $\{R_\nu^r < A/N_A\}$,



log 



A 



> log 



/ ANa 



log A^A + 0(1) ^ cx) as A — !> 00. 



i + KJ "\Na + a^ 

Therefore, nonlinear renewal theory can be applied to the sequence 



Zv'+n 



E ^. + iog(i + Y 



i=iy+l 



Rl 



n^l. 



Note also that 



so that the sequence 



< log 1 



K,, 



l + K 



<log(l + K,«), 



log 1 



K,, 



i + Rr 



71 ^ 1 



satisfies the conditions of the nonlinear renewal theorem of Woodroofe [15, Theo- 
rem 4.5] uniformly in R^,. Indeed, the sequence {V,j,n}n^i is slowly changing and 

converges Pi,— a.s. (as n -^ 00) to the finite random variable Vi,,oo = J27Li,+i ^~^^ ■ 
The nonlinear renewal theorem yields the following asymptotic approximation: 



E,(Tl-z.|T^>t.,i?i;) = -<^ logA + >.-log(l + i?:) 



log 1 



K,. 



l + RZ 



Tl>v,Rl 



o{l), 



where o(l) ^- as ^ — >■ cx) uniformly on {Na ^ i' < oo,Rl < A/Na, ^ r < cx)}. 

Note that all the necessary conditions of this theorem hold trivially. The only 
condition that requires checking is the following: For some £ > 0, 

{logA)P^(T^-iys^erHogA\TA>v,Rl) >0 on {R^ < A/Na} ■ (A.5) 

A—>oo 

Let L = La^c = (1 -e)/"MogA, Pu{A,e) = P^{T^ sC i^ + L\Ta > ^.Rl), and 
BA,e — {Ta ^ v + La.e]- Changing the measure Poo i—^ P^,, we obtain that for any 
C > and e e (0, 1) 



>E. 



exp 
exp 



[-s'^f]t{s,^^}\n>v,R: 



{-'tT] 



IrH. c^ + i^r-i.lrj > I', i?J^ 



'{t 



,sz,V^c}\ 



> e 



~c 



pAt1^i^ + l\ti>v,r:)- 



pJm!,^S:Xn>C\Tl>u,Rl 
Setting $C = (1 + \varepsilon)IL$ and noting that

P. f max S:Xl > C\Tl > v, R^] = Po f max 5„ > c] , 

= Poo f inax <+„ ^ A\T^ > v, K^ 



Poo max Rl^„ ^ A\Rl 



we obtain 



where 



PuiA,e) i:a^{A,e)+P{A,e), 



a4A,e) = e'^+-')^^Poo max i?^,„ ^ A|i?n , ^(A,e) = Pq max Sn > {l+e)IL 

Note that {Rl,+n}n^i is a non-negative Poo-submartingale with mean Eoo(-RJ^+„|i?J^) 
n + i?J^. By Doob's submartingale inequahty, 



..(A,£) ^ (e(i+-)") ii±l^ = (e(i-^')i°8^) 



(i^£l^4^^A±^=o(l/logA) onK<A/iV^ 



A- 

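It remains to show that $\beta(A, \varepsilon) = o(1/\log A)$ as $A \to \infty$ for some $0 < \varepsilon < 1$.
Note that

$$\beta(A, \varepsilon) = P_0\Bigl(\max_{1 \le n \le L}(S_n - IL) \ge \varepsilon IL\Bigr) \le P_0\Bigl(\max_{1 \le n \le L}(S_n - In) \ge \varepsilon IL\Bigr).$$

By Lemma A.1,

$$L\,P_0\Bigl(\max_{1 \le n \le L}(S_n - In) \ge \varepsilon IL\Bigr) \xrightarrow[L \to \infty]{} 0,$$

so that $\beta(A, \varepsilon) = o(1/L) = o(1/\log A)$. Thus, condition (A.5) holds, the asymptotic
approximation (3.9) follows, and the proof is complete. $\Box$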

Proof of Theorem 3.2. Obviously, $\mathrm{ADD}_\infty(T_A^r) = E_0 T_A^{Q_A}$ for any fixed $r \ge 0$
and any $A > 0$, so that it suffices to prove asymptotic expansion (3.12) only for
$\mathrm{ADD}_\infty(T_A^r)$.

Write $L_A = A/N_A$. Note that $L_A^{-1} = o(1/\log A)$ as $A \to \infty$. Obviously,



E: 



,(T^ - v\T\ >u)^ E,(r; - v\Tl >v,Rl< La)Poo{K < La\Ta > i^) + 
+ E.{Tl - v\Tl >y,Rl^ La)Poo{R: ^ La\TI > u) 
= E,(r^ - v\Tl >v,Rl< La) + (A.6) 

+ Poo(i?: > La\TI > v)£,{Tl -v\Tl> V, r: ^ La) - 
- Poo(i?: ^ LA\n > ^)E.(T^ - ^l^i > '^,K < La). 
Note that for large enough A

and there exists i'^ such that for all i' > iy\, hy Kesten [2, Theorem 5], 



(A.7) 



Poo(i?: ^ LA\r^ >iy)^ 2(1 - Qst{LA)) 



L. 



-(l + o(l))^0 asA-^cx) (A.8 



(when r ^ A first condition on R^, conditional on TJ > v), so that the second term 
in the last equality in (A. 6) is o(l). By Lemma 3.1, the first term in the last equality 
in (A. 6) is equal to 

E,(T^ - i.|Tl >iy,R:<LA) = jl log^ + >.- Eoo [log(l + i?^)m >iy,R:< La] 



E. 



log 1 



K.c 



l + RZ 



T^ >v,Rl< La 



o(l). 



Now, 



£^[\og{l + Rl)\Tl>u,Rl<LA] 

_ £oo [log(l + Rl)\T\ > H - Eoo [log(l + Rl)\Tl >v,Rl^ La] Poo{R: > La\TI > v) 

\-P^{RI-^La\T-j^>v) 

whereby (A.8) Poo(i?;; > La\TI > v) = o(l/logA). Also, since i?^ < ^ on {T^ > z/}. 



^ Eo, [log(l + R:)\T-^ >iy,R:> La] ^ log(l + A), 



so that 



Eoo [log(l + RDin >'^,K< La] = Eoo [log(l + RD^ > H 

, PooiK ^LA\T2>iy) 



l-PooiRl^LA\Tl>l^) 

E^[iog(i + i?:;)|ri>H + o(i) 



0(1) (A.9) 



(since Eoo[log(l + Rl,)\T^ > i'] < log(l + A)). By Remark 3.1, V^^oo is independent of 
Rl, and distributed as 14o under Pq. Since ADDoo(Ta) exists, it follows that 



ADDoo(Tl) - lim E,(T1 - iy\T^ > ly) 

= J jlog A + x-E [log(l + Roo)] - E 



log 1 



V^ 



l + Ro 



0(1) 



= J {log^ + X-E [log (1 + i?oo + Ko)]} + o(l) as A ^ cx) 

and the proof of (3.12) is complete. 

It remains to prove the validity of asymptotic approximation (3.13). Putting
$\nu = 0$ in (A.4), we obtain that the stopping time of the SR-r procedure can be
written as

$$T_A^r = \inf\{n \ge 1 : S_n + \log(1 + r + V_{0,n}) \ge \log A\},$$

where $V_{0,n} = \sum_{i=1}^{n-1} e^{-S_i}$ and $\{S_n\}_{n \ge 0}$ is the random walk with the drift $I$ (under
$P_0$). The sequence $\{\log(1 + r + V_{0,n})\}_{n \ge 1}$ is slowly changing and converges $P_0$-a.s.
to the random variable $\log(1 + r + V_\infty)$ whose expectation is equal to $C_r$. The crucial
condition

$$(\log A)\,P_0\bigl(T_A^r \le \varepsilon I^{-1}\log A\bigr) \xrightarrow[A \to \infty]{} 0 \quad \text{for some } \varepsilon > 0$$

holds by an argument analogous to the one that yields (A.5). Therefore, nonlinear renewal theory can
be applied to yield

$$E_0[T_A^r] = \frac{1}{I}(\log A + \varkappa - C_r) + o(1) \quad \text{as } A \to \infty,$$

and the proof is complete. $\Box$

Proof of Theorem 3.3. Recall that $N_A = o(A/\log A)$. We have

^irr. . _ Er^o E.(rA - y\TA > ^)Poo(Ta > ly) 

J (J^A) — F — 7^ 

too J A 

_ E^^Q E.(Ta - i^\Ta > i^)Poo{Ta > v) , 



EooJa 

--Na + 1 



HILna+i ^i^i^A - y\TA > 1')Poo(Ta > v) 



EooTa 

Since $E_\infty T_A \ge A$ and since for a sufficiently large $A$,

$$\sup_{\nu} E_\nu(T_A - \nu \mid T_A > \nu) \le E_0 T_A \le \frac{2\log A}{I},$$

for the first term, denoted as $J_1$, we have

$$J_1 \le \frac{N_A E_0 T_A}{E_\infty T_A} \le \frac{2N_A \log A}{IA},$$

so that $J_1 = o(1)$ as $A \to \infty$.
Write

$$D_A = \frac{1}{I}(\log A + \varkappa - C_\infty).$$

By Lemma 3.1, uniformly in $N_A \le \nu < \infty$

$$E_\nu(T_A - \nu \mid T_A > \nu) = D_A + o(1) \quad \text{as } A \to \infty.$$

Therefore, for the second term (denote it as $J_2$) we have



Ji 



E^=i DaPo.{Ta >i^)+ Y.7=Na+i[Da + o(1)]Poo(Ta > ^) 

EooTa 

Y.^=iDaPo.{Ta>i^) 
EooTa 

tooJA 
where the last equality follows immediately from the fact that

DaE^=i PooJTa > ^) ^ (21ogA)iV^ ^^ 

Etx)T/l lA A^oo 

where the inequality holds for a sufficiently large A. The required result follows. D 
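Proof of Lemma 3.3. We must show that $P_\infty(R^{(\eta)}_{n+1} \le x \mid R^{(\eta)}_n = t,\, R^{(\eta)}_{n+1} < A)$ is a
non-increasing function of $t$.
For $x < A$ and $t \ge 0$,

$$P_\infty(R^{(\eta)}_{n+1} \le x \mid R^{(\eta)}_n = t,\, R^{(\eta)}_{n+1} < A)
= \frac{P_\infty(\Lambda_1 \le x/(\eta + t))}{P_\infty(\Lambda_1 \le A/(\eta + t))}
= \frac{F(\log x - \log(\eta + t))}{F(\log A - \log(\eta + t))}
= \frac{F(y - s)}{F(a - s)}, \qquad (A.10)$$

where we used the notation $y = \log x$, $s = \log(\eta + t)$ and $a = \log A$.

If $\log F(x)$ is concave, then $(\log F(x))'' \le 0$, so $(\log F(x))' = f(x)/F(x)$ is a
non-increasing function of $x$. Therefore, we have

$$\frac{\partial}{\partial s} P(R^{(\eta)}_{n+1} \le x \mid R^{(\eta)}_n = t,\, R^{(\eta)}_{n+1} < A)
= \frac{\partial}{\partial s}\Bigl(\frac{F(y - s)}{F(a - s)}\Bigr)
= \frac{-f(y - s)F(a - s) + F(y - s)f(a - s)}{F(a - s)^2}$$

$$= \frac{F(y - s)}{F(a - s)}\Bigl[\frac{f(a - s)}{F(a - s)} - \frac{f(y - s)}{F(y - s)}\Bigr] \le 0,$$

which implies that $P(R^{(\eta)}_{n+1} \le x \mid R^{(\eta)}_n = t,\, R^{(\eta)}_{n+1} < A)$ is a non-increasing function of $t$.
$\Box$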

Proof of Theorem 3.5. Let $F$ denote the cdf of $\log\Lambda_1$, and let $P$ denote probability
when $F$ is the cdf of $\log\Lambda_1$ and the observations are i.i.d. The following applies both for
$F$ defined by $X_i \sim f$ as well as for $F$ defined by $X_i \sim g$. Under the assumption of
the theorem, $F$ is log-concave.

If To < ri, then i^Jj" < R^^ for all n ^ 0, so that {R^^} crosses A no later (and 
often earlier) than {Rn'}- Hence, EooT^ is a decreasing function of r and AL is an 
increasing function of r. (Note that this is true in general, with no assumption of 
log-concavity.) 

Now, note that ADDoo{T^) = ADDoo(r^-') = EoT^-* and, therefore, it suffices to 
show that EgT^'* and Ea^T^'^ are increasing functions of A. 

Let (A/,i )„^05 (-^^ri )n^Oi and {Mn )n^o be Markov processes governed respec- 
tively by 

P (Af W, < x\Mi'^ = t) = P (i?W , < x\RlP = t, i?W , < a) 
P [Mi% < .t|A/(2) = t) = p (i?W ^ <; ^\R(r) ^ t, Rill, < tjA) 
P {^M^l ^ x|M(^) = t) = P [rI^I < x|i?(") = t, R^:'l < ^a) , 

where rj ^ I and Rn' satisfies (3.21) with zero initial condition i?Q = 0. 
Note that A/„ ' = rjMn , and therefore, M„ ' exits [0, A) at the same time that
Mn exits [0,7]A) regardless of the distribution of A„. It foUows that Q)^ (x) = 

Q ]^{r]x), where Q]^ and QJ^^ are the Poo -stationary distributions of M„ and M„ ' , 
which are also the corresponding Poo-quasi-stationary distributions of the Markov 
processes i?„ ' and Rn' . 

Let ^ i < rjA. In a manner similar to (A. 10), 

V "+i ' " ' n+1 / y i^(logA-t-log77-log(77 + t)) 



F{arf - Sn) 

V "+1 ' " "+1 ' / F(logA + log77-log(l + t)) 

F[ar, - siY 
where y = logx, s^, = log(77 + t) and a^ = log A + log?;. Writing 

Fiy^s) 



S(s) = 



F{a^ ~ s)' 



we obtain that 



P (Mf^^i ^ x\Mi^^ = t,M^l, < r?A) = S(si). 

Since (by the same consideration as in the proof of Lemma 3.3) S(s) is non-increasing 
function of s, it follows that 

P (Aff^^i ^ x| A43) = t, mI^I, < r?A) s; P (^MJ^l ^ x\Ml^^ = t, M^^l, < r^A 

Now let ^ s ^ ^ < 7]A. By Lemma 3.3, Mn is stochastically monotone, so that 

P (^Mi% ^ a:|M(2) = s,Mi% < 7?a) > P (^Mi% ^ xjM^^) = t,Mi% < tjA) . 

(A.ll) 

Construct a sample space where A/q = Mq = 0. By (A.ll), M^ is stochas- 

tically larger than M^ ' . Hence, one can construct the probability space so that also 

(3) (2) 

M^ ^ M^ ' . Now, due to stochastic monotonicity of the transition probabilities, 

(3) (2) 

M^ is stochastically larger than M^ , and one can construct the probability space 
SO that also M2 ^ Afj . Continuing this inductively, one obtains a sample space 
where Af^^^ ^ M^^'' for all n ^ 0. Under Poo, A^n^^ and M^^^ tend in distribution 
to the quasi- stationary distributions Q^^ and Q^^, respectively, so it follows that 

QS ^ QS(^) for all X. 

"^"(3) 

Finally, consider the process AfA governed by 



(A/f^i s; x\Mi'^ =t)= P(Ai ^ x/{r^ + 1)) 
and started at A/q ' ^ Q^^ and the process MA ' governed by

P (Mi'l, ^ x|M(2) = t) = P(Ai < x/{l + 1)) 

— (2) (2) 

and started at A'/q ' ^ Ql^- In just the same way as above, we can construct a single 

— '(3) — '(2) — '(3) 

probabihty space with Af„ ^ MA for all n ^ 0. Therefore, the process MA will 

(2^ 

exit above rjA no later than the process MA , and therefore the expected exit time of 
AfA will not exceed that of MA . But these expectations arc ARL's to false alarm if 
Xi ^ f and ADDoo if Xi ^ g. Furthermore, the ARL to false alarm and the ADDqo of 
MA (where mA = 77"^ A/A ) are equal to those of AfA ■ Clearly, the first exit time 
of MJl^^ from [0, A) is nothing but the SRP stopping time T^-^ = inf{n: i?^-* ^ A}. 
Hence, it follows that both the average delay to detection EoT^"* (= ADDoo(Ta)) and 
the ARL to false alarm E^oT^-^ of the SRP procedure are increasing functions of A, 
and the proof is complete. D 